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ABSTRACT 


A  Bayesian  updating  procedure  is  proposed  for  filtering  the  process 
parameters  in  the  two-stage  Markovian  constant  variance  model  for  time  varying 
normal  data  in  the  situation  where  the  signal  to  noise  ratio  is  unknown.  A 
forecasting  procedure  is  described  which  yields  the  entire  predictive 
distribution  of  future  observations;  a  numerical  study  involves  an  on-line 
analysis  for  chemical  process  concentration  readings.  A  similar  method  is 
developed  for  Poisson  data  and  applied  to  the  analysis  of  an  industrial 
control  chart. 


AMS  (MOS)  Subject  Classifications:  62M05;  62M20 

Key  Words:  Kalman  filter,  ARIMA  models;  Linear  prediction;  Variance 

components;  Bayesian  updating;  Predictive  distribution;  Quality 
control;  Poisson  observations;  Linear  growth  model;  Box-Jenkins. 
Work  Unit  Number  4  -  statistics  and  Probability 


Sponsored  by  the  United  States  Array  under  Contract  No.  DAAG29-80-C-004 1 


SIGNIFICANCE  AND  EXPLANATION 


Simple  two-stage  Bayesian  models  are  considered  for  time-varying  normal 
and  Poisson  data,  in  the  linear  prediction  situation.  The  normal  model  is 
equivalent  to  an  ARIMA  process  and  posterior  estimates  for  the  process  levels 
and  signal  to  noise  ratio  are  compared  with  the  Box-Jenkins  likelihood 
procedure.  The  posterior  distribution  of  the  variance  ratio  may  be  updated 
on-line  and  the  unconditional  posterior  densities  and  predictive 
distributions,  of  the  process  levels  and  future  observations,  may  be 
calculated  at  each  time  stage,  providing  useful  inference  procedures  for 
filtering  and  forecasting.  The  methodology  is  applied  in  some  detail  to  the 
Box-Jenkins  chemical  process  data.  In  the  Poisson  situations  similar 
procedures  are  developed  using  a  Gamma  approximation  to  one  of  the  updating 
distributions  in  the  model.  This  method  is  illustrated  by  an  analysis  of  an 
industrial  control  chart. 
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A  BAYESIAN  APPROACH  TO  MARKOVIAN  MODELS 
FOR  NORMAL  AND  POISSON  DATA 


Tom  Leonard 
1 .  Introduction 

Following  Kalman  [5],  Blight  [1],  and  Harrison  and  Stevens  [4],  we 
consider  the  two-stage  Markovian  (constant  variance)  model 


y .  =  0  .  +  6 . 
l  l  l 


9i  = 


i-1 


+  e , 


(1.1  ) 

(1.2) 


(x  —  1f2f»»«|in) 

where  the  0^  represent  the  process  parameters  and  the  6^  and  are 

mutually  independent  and  normally  distributed  error  terms  with  zero  means. 

2 

The  6^  and  are  taken  to  possess  respective  common  variances  r  and 

I 

2 

a  ,  with  the  exception  of  e,  which  for  convenience  is  taken  to  possess 
2  2 

variance  *  y*  where  Y  is  known.  The  initial  mean  0.)  then  possesses 

.  2 
a  normal  prior  distribution  with  mean  0Q  and  variance  0Q. 

For  i  =  2,...,m  the  model  in  (1.1)  and  (1.2)  is  mathematically 

equivalent  to  an  ARIMA  process,  of  the  type  discussed  by  Box  and  Jenkins  ([2], 

p.  8)  where 


-  yi-i  ’  qi  '  5qi-i 


with  the  qi  representing  independent  and  normally  distributed 

-1  2 

variables  with  zero  means  and  common  variance  %  t  ,  and  with 


(1.3) 

random 
£  denoting 


the  smaller  root  of  the  equation 

2  +  o  =  £_1  +  5  , 


(1.4) 


with 


2  2 

a  =  a Vt 
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The  equivalence  may  be  demonstrated  by  differencing  out  the  9^  from 

(1.1)  and  (1.2).  In  Chapter  7,  Box  and  Jenkins  recommend  a  procedure  for  the 

approximate  maximum  likelihood  estimation  of  £.  We  will  later  compare  this 

with  our  inferences  based  upon  the  posterior  distribution  for  a. 

2  2 

From  [4],  we  have  that,  when  T  and  o  are  known,  the  posterior 

distribution  after  time  m  of  the  parameter  8  is  normal  with  mean  a  and 

m  m 

variance  vm.  The  latter  may  be  obtained  from  the  updating  relations 

ai  =  ai_1  +  -  at-1>  0*5) 

(i  *  1 , . . . ,m) 


and 

D±  »  (Di1  +  ®)/(Di_1  +o+1)  (1.6) 

(i  3  2, . . . ,m) 


with  v±  -  T2Di#  a  «  a2/t2,  aQ  -  9Q,  and  D1  -  Y/U+Y). 

Mote  that  aB  provides  a  smoothed  value  after  time  m  for  the  process 

parameter  9  ,  it  is  also  the  j-step  ahead  forecast  for  6  for  any  j  = 
m  m+j 

1,2, ....  i 

2  2 

We  propose  to  extend  these  results  to  the  situation  where  t  and  a 

are  unknown  by  supposing  that  any  prior  information  about  the  variances  may  be 

2  2 

adequately  represented  by  taking  v^/x  and  y>2K2^a  to  P°8sesa 
2 

independent  x  -distributions  with  and  degrees  of  freedom 

respectively.  The  values  1  and  x^1  could  be  specified  as  the  respective 

-2  -2 

prior  means  of  the  precisions  T  and  o  t  and  are  then  prior 

'sample  sixes'  measuring  the  strength  of  the  prior  information.  For  example, 

2 

as  ♦  •  the  first-stage  variance  x  becomes  known  and  equal  to  x  , 
Ignorance  priors  will  be  discussed  in  Section  4>  the  presence  of  prior 
information  is  not  requisite  for  the  practical  applicability  of  our  method. 


2.  Bayesian  Theory. 

Under  the  prior  assumptions  described  in  the  previous  section  the  joint 
2  2  2 

prior  density  of  T  and  u  =  a  /T  is  given  by 


if(T  ,ct)« 


-V2  (v  +v  +2)  -V2  (v  +2) 

(T  )  a  exp{- V2  V1K1/T  -V2v2VaT2} 


(2.1 ) 


(0  <  t  <  «•»  0  <  a  <  •) 


We  seek  the  posterior  distribution  of  the  variance  ratio  a  after  m 

observations  =  (y,...,ya).  It  is  firstly  necessary  to  find  the 

distribution  of  y^  given  T2  and  a. 

For  i  =  we  have  under  the  notation  introduced  in  (1.5)  and 

(1.6),  that  the  conditional  distribution  of  y^  given  y*i-1^,  T2,  and 

2 

a,  is  normal  with  mean  ai_1  and  variance  x  (1  +  o  +  ).  Since  y1 , 

2  2 
given  x  and  a  is  normal  with  mean  0Q  and  variance  x  (1+Y),  the 

required  distribution  of  y^  is 


p(y(m> |t2,o)  =  p(yi |x2,a)  n  p(y.|y(l_1 ) ,t2 ,a) 

i=2 

«(T2)-V2m  u^(a)  exp{-  V2x"2U2(a)> 


(2.2) 


with 


U1  (a) 


(1+Y) 


-1/ 


m 

n 

i*2 


(1  +  a  + 


Di-1> 


-1/ 


(2.3) 


and 


U2(a) 


1/  o  m 

(1+Y)  /2  (X  -  8  )2  +  l 

i-2 


(1+o+  Di_1 )"’ -  ai-i)2  (2*4) 
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n 


where  tie  a^  and  may  be  obtained  from  the  updating  relations  in  (1.5) 

and  (1.6). 

2 

By  Bayes1  theorem,  the  joint  posterior  distribution  of  t  and  o  is 
given  by 


ir(T2,ot|y(m))  «  n(t2,a)p(y<m)  jr2,a) 


(2.5) 


( 0  <  t  <  «m  0  <  a  <  ») 


where  the  first  and  second  contributions  to  the  right  hand  side  are  given  in 

2 

(2.1)  and  (2.2)  respectively.  Integrating  out  t  we  find  that  the  marginal 
posterior  distribution  of  the  variance  ratio  a  is 


n(a|y(m)) 


-  '/2  v 

«  U*(a)  {U*(a)}  T 


(2.6) 


(0  <  a  <  •) 


where 


U*(a)  *  a 


-  V2  <v2> 


U,  («) 


U*(o)  =  vi<i  +  v^/a  +  u2(a) 


v  =  v  +  v_  +  m 
T  1  2 


(2.7) 


(2.8) 


(2.9) 


with  U1 (a)  and  U2(a)  defined  in  (2.3)  and  (2.4)  respectively. 

We  are  now  in  a  position  to  obtain  the  unconditional  posterior  mean  a*, 


of  9^,  after  m  stages.  This  is  given  by  the  expectation 
a*  *  EOjyJ  -  Jo  E(9m|y(m),a)n(a|y(m))do 


(2.10) 


of  the  conditional  posterior  mean  given  a  (i.e.  am  from  (1.5))  with 

respect  to  the  posterior  distribution  of  a,  given  y^m\  in  (2.6).  Note 

that  a*  may  be  calculated  by  a  straightforward  one-dimensional  integration, 
m 

The  above  procedure  possesses  the  appealing  property  that  it  is  easy  to 

update  in  time;  this  aspect  will  be  considered  in  the  next  Section.  For 

vt  >  2  the  posterior  variance  after  m  time  stages  is  equal  to  the 

expectation  of  the  quantity 

(a  -  a*)2  +  (v„  -  2)_1U*(o)d  (2.11) 

mm  T  2  m 

with  respect  to  the  same  distribution  of  a  in  (2.6),  where  a  ,  D  .  U*(a), 

mm2 

and  vt  are  given  in  (1.5),  (1.6),  (2*8),  and  (2.10)  respectively. 

It  is  moreover  possible  to  compute  the  whole  posterior  density  of  0  , 

m 

(m) 

given  y  ,  using 


*(ejy(m))  =  /”  T(0m,a|ylm')do 


(m) 


_  1/  _i  2  “  1/2  (VT+1  } 

n  D  /2U?(a>{Ui(«)  +  D  (0  -  a  )  }  da 

0  m  1  2  mm  m 


(2.12) 


with  U*(a)  and  U*(a)  defined  in  (2.7)  and  (2.8)  respectively. 

2 

The  posterior  density  of  the  first  stage  variance  T  may  be  obtained  by 

integrating  the  quantity  in  (2.5)  with  respect  to  a;  however  when  vt  >  2 

2 

the  posterior  mean  of  T  may  be  obtained  more  simply,  using 

«t2|yl")>  -  Q  E(t2|,(*)..).(a|,<*>)d. 


-  (»T  -  2)"'  /j  0*(ol*(<i|y(m,)da 


(2.13) 


where  the  first  and  second  contribution  to  the  integrand  are  given  in  (2.8) 
and  (2.5)  respectively. 
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Lastly,  it  is  straightforward  to  compute  the  predictive  distribution 


after  m  time  stages  for  a  future  observation  y  ....  Pot  j  =  1,2,...  this 

inxj 

will  possess  mean  a*  in  (2.10),  and  density 

in 


Q  vh1”11";1"1  +  -  »„>2} 


-V2  (%+d 


da 


(2.14) 


(-•  <  y  <  00 ) 
m+j 


where  D  •  =  1  +  ja  +  D  . 

J  m 

Our  approach  provides  us  with  formally  justified  filtering  and 

forecasting  procedures,  and  enables  us  to  make  inferences  at  any  time  stage 

2  2 

about  the  process  parameter  0  ,  the  variances  t  and  o  ,  and  the  future 
observations  ym+j  by  considering  their  posterior  or  predictive 


distributions 


wssmJMwman 


3.  Updating  Procedures 

In  order  to  perform  the  numerical  integration  in  (2.10),  store  am  in 

(1.5)  and  the  distribution  in  (2.6)  for  the  values  of  a  lying  in  a  set 

0  =  {h,2h,3h,...,Ah}  .  (3.1  ) 

The  integer  t  and  width  h  choose  be  chosen  after  balancing 
considerations  of  computer  speed  and  storage  with  the  degree  of  accuracy 
required  in  the  numerical  integration.  With  Simpson's  rule,  t  =  100  and 
h  =  0.01  should  suffice.  We  introduce  the  notation  W(a)  to  denote  the 
expression  on  the  right  hand  side  of  (2.6). 

After  any  time  stage  i  it  is  only  necessary  to  store  the  latest  values 
of  a^,  D^,  U*(a),  U*(a)  and  W(a)  for  each  ct  e  ft.  Previous  values  and 
observations  may  be  discarded.  The  relevant  quantities  are  defined  in  (1.5), 

(1.6) ,  (2.7),  (2.8),  and  (2.6)  respectively.  For  example,  after  time  i  *  1 
we  have 


a.  = 


0O+D1(X1“V  ' 


D.  = 


U*(a) 


Y/( 1+Y) 

-V2  (v  +2) 


a  -  ,  U$«*>  -  Vi  +  V2/a 


and 

-  V2  <vV 

W(o)  =  U*(a){u*(a)} 

Given  the  stored  values  after  time  i  -  1  it  is  straightforward  to 
update  to  new  values  after  reading  in  a  fresh  observation  y^  at  time  i. 

The  validity  of  our  routine  may  be  checked  from  (1.5),  (1.6),  (2.3)  and  (2.4), 
step  (ii)  was  devised  to  minimize  problems  >f  exponent  .al  overflow  when 
calculating  W(a)  for  large  i.  The  following  fou.  steps  should  be  completed 
in  order  for  each  <*  6  R. 
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(i)  Calculate  a  new  value  for  U*(a)  by  dividing  the  old  value  by 

(1  +  °  +  °i-1)1/2* 

(ii)  Calculate  the  quantity 

z.  *  (1  +  a  +  D^rVi  -  Vl)2 

and  obtain  a  new  value  for  W(a)  by  multiplying  the  old  value  by 

->/o 

(1  +  a  +  D±_1 )  /2Qi(zi ) 

with 

1  -1/2<v1+vi) 

Qi(q)  =  {U*(a)}  /2{l  +  q/U*(a)}  .  (3.2) 

(iii)  Calculate  a  new  value  for  U*(ct)  by  adding  z^  to  the  old  value. 

(iv)  Calculate  new  values  a^^  and  Di  from  (1.5)  and  (1.6)  respectively. 

After  each  time  stage  the  function  W(a)  should  be  normalized  by 
dividing  through  by  its  total  over  the  set  ft. 

We  seem  to  have  provided  a  simple  procedure  for  updating,  to  any  required 
accuracy,  the  posterior  distribution  in  (2.6).  Note,  from  (2.10)  that  the 
unconditional  -  nn  a?  of  0^  may  now  be  computed  by  numerically  integrating 
ai  over  a  6  ft,  and  with  respect  to  this  distribution. 

It  is  straightforward  to  employ  the  stored  values  after  time  i  to 
calculate  all  the  means,  variances,  and  distributions  described  in  the 


previous  Section.  It,  for  example,  follows  from  (2.11)  that  the  posterior 
distribution  of  0^  after  time  i  is  proportional  to  the  integral  with 


respect  to  a  of 


W(a)D~  1/2  Qt{D”1  (@i  -  a^  )2} 


(3.3) 


where  Q^(q)  is  defined  in  (3.2).  The  expression  in  (3.3)  may  be  averaged 


over  o  6  ft  for  each  required  value  of  9^.  The  obvious  analogous  procedure 
is  available  for  the  predictive  distribution  in  (2.13). 
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4.  An  On-Line  Analysis  of  Chemical  Process  Readings 


The  data  in  the  second  column  of  Table  1  were  reported  by  Box  and  Jenkins 
(p.  525);  we  have  subtracted  17.0  from  each  observation.  On  p.  239  Box  and 
Jenkins  fit  in  ARIMA  model  in  the  complete  set  of  197  observations.  Their 
identified  model  takes  the  form  given  in  (1.3),  with  S  =  0.70.  This 
corresponds  to  a  value  for  the  variance  ratio  of  ct  =  0.13. 

_  2 

We  firstly  proceed  under  an  assumption  of  prior  ignorance  about  0^  a  , 

2  _  l/_ 

and  T  and  set  Y  =  “  and  hence  D1  =  1  .  The  term  O+Y)  ^  on  the  right 

hand  side  of  (2.3)  should  then  be  removed  together  with  the  first  term  on  the 

right  hand  side  of  (2.4). 

We  select  the  improper  prior  density 

tf(t2,cO«  t2  (0  <  T2  <  »;  0  <  «  <  “)  (4.1) 

2 

for  t  and  a  because  this  particular  choice  ensures  that  the  posterior 
distribution  will  remain  proper  for  an  i  =  1,...,m.  This  is  equivalent  to 
setting  ^  =  <2  =  0  and  replacing  and  by  2  and  -2  respectively 

in  the  analysis  of  Section  3. 

We  obtained,  under  this  ignorance  prior,  the  smooth  values  a£,  for 

0^,  listed  under  (A)  in  the  third  column  of  Table  1;  the  posterior  means  ot 

of  a  are  in  the  fourth  column.  The  smoothed  value  a£  for,  say,  i  =  20, 

was  calculated  from  (2.10)  and  only  depends  upon  the  first  twenty  observations 
20 

y  .  It  is  therefore  the  latest  on-line  value  for  the  process  level  after  20 
stages  of  the  chemical  experiment. 

The  estimate  a  is  initially  rather  large,  causing  the  a*  to,  very 

A  1 

reasonably,  remain  close  to  the  observations.  As  the  process  proceeds,  a 

A 

tends  to  get  smaller,  and  greater  smoothing  ensued.  After  time  m  =  197  our 

values  are  x  =  0. 4,  a*  =  0.49  and  a.  =  0.20.  The  final  posterior  mean  in 
m  m  A 

(2.13)  of  the  first  stage  variance  is  equal  to  0.066.  Our  value  for  a 

A 

fl 


implies  that  this  is  five  times  as  large  as  the  second  stage  variance 
suggesting  that  a  moderately  large  amount  of  noise  in  the  process  is  accounted 
for  by  the  first  stage  fluctuations,  but  that  the  process  parameters  still 
vary  noticeably  with  time. 

The  final  posterior  density  of  o  possessed  a  mode  at  a  -  0.13.  This, 

quite  interestingly,  is  absolutely  identical  to  the  value  recommended  by  Box 

and  Jenkins.  However,  both  the  posterior  distribution  and  likelihood  of  a 

are  positively  skew  with  thick  tails,  suggesting  that  our  recommended  value 

0.20  (i.e.  the  posterior  mean)  might  also  be  plausible. 

He  calculated  the  whole  posterior  density  curve  of  0 ^  from  (2.12)  and 

found  this  to  be  almost  exactly  normal  with  mean  0.49  and  variance  0.022. 

Similarly,  the  predictive  density  curves  for  y198»  y-, 99» •  •  •  ,y202  were  a11 

well-approximated  by  normal  distributions  with  mean  0.49  and  respective 

variances  0.101,  0.114,  0.127,  0.140,  and  0.153.  Note  that  these 

distributions  are  easily  calculated  after  any  time  stage. 

The  analysis  was  repeated  under  an  assumption  of  definite  prior 
2  2 

information  about  x  and  o  .  We  set  =0.05,  K2  ~  0.025,  and 

V1  =  \>2  =  10,  leading  to  a  prior  mean  of  0.63  for  the  variance  ratio  a. 

The  smoothed  values  for  analysis  B  are  listed  in  the  fifth  column  of  Table 

1,  and  the  corresponding  posterior  means  a  for  a  in  the  sixth  column. 

B 

A  possible  advantage  of  a  proper  prior  distribution  for  a  is  that  the 

smoothed  process  settles  down  more  quickly;  smaller  values  are  observed  for 

a  in  the  initial  stages.  The  estimate  a  however  remains  somewhat  larger 
B  B 

than  a,  in  our  example,  in  view  of  the  information  introduced  by  the  prior 
A 

distribution. 
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Table 

1:  An  On-Line  Analysis  of 

Chemical  Process 

Concentration  Readings 

Tine 

Observation 

Smoothed 

o  . 

Smoothed 

Stage  (i) 

<*i> 

Value  A 

A 

Value  B 

«  B 

1 

0.0 

0.00 

5.00 

0.00 

0.63 

2 

-0.4 

-0.33 

5.00 

-0.24 

0.48 

3 

-0.7 

-0.63 

5.25 

-0.47 

0.48 

4 

-0.0 

-0.85 

5.54 

-0.68 

0.49 

5 

0.1 

-0.08 

4.92 

-0.31 

0.39 

6 

-0.1 

-0.11 

4.90 

-0.22 

0.41 

7 

-0.2 

-0.19 

4.83 

-0.21 

0.42 

8 

0.4 

0.28 

4.87 

0.06 

0.43 

9 

0.1 

0.12 

4.72 

0.08 

0.44 

10 

0.0 

0.02 

4.67 

0.04 

0.44 

11 

-0.3 

-0.23 

4.70 

-0.12 

0.44 

12 

0.4 

0.26 

4.35 

0.12 

0.41 

13 

0.2 

0.20 

4.24 

0.15 

0.42 

14 

0.4 

0.35 

4.24 

0.27 

0.43 

15 

0.4 

0.38 

4.22 

0.33 

0.45 

16 

0.0 

0.09 

4.08 

0.17 

0.44 

17 

0.3 

0.25 

3.85 

0.23 

0.44 

18 

0.2 

0.21 

3.74 

0.22 

0.44 

19 

0.4 

0.35 

3.67 

0.30 

0.45 

20 

-0.2 

-0.05 

3.46 

0.07 

0.43 

21 

0.1 

0.07 

3.16 

0.09 

0.43 

22 

0.4 

0.30 

3.11 

0.23 

0.43 

23 

0.4 

0.36 

3.13 

0.31 

0.43 

24 

0.5 

0.45 

3.16 

0.40 

0.44 

25 

0.4 

0.41 

3.05 

0.40 

0.45 

26 

0.6 

0.54 

3.02 

0.49 

0.46 

27 

0.4 

0.44 

2.64 

0.45 

0.46 

28 

0.3 

0.35 

2.80 

0.38 

0.46 

29 

0.0 

0.11 

3.00 

0.20 

0.46 

30 

0.8 

0.56 

2.26 

0.48 

0.42 

31 

0.5 

0.50 

2.02 

0.49 

0.42 

32 

1.1 

0.88 

2.39 

0.77 

0.43 

33 

0.5 

0.62 

1.69 

0.64 

0.41 

34 

0.4 

0.49 

1.70 

0.53 

0.41 

35 

0.4 

0.45 

1.68 

0.47 

0.41 

36 

0.1 

0.25 

1.85 

0.30 

0.42 

37 

0.6 

0.46 

1.44 

0.44 

0.40 

38 

0.7 

0.59 

1.52 

0.56 

0.41 

39 

0.4 

0.48 

1.31 

0.49 

0.40 

40 

0.8 

0.65 

1.23 

0.63 

0.40 

41 

0.6 

0.61 

1.12 

0.61 

0.40 

42 

0.5 

0.55 

1.06 

0.56 

0.40 

43 

-0.5 

•0.06 

1.80 

0.09 

0.39 

44 

0.8 

0.43 

0.47 

0.41 

0.31 

45 

0.3 

0.37 

0.40 

0.36 

0.31 

46 

0.3 

0.35 

0.39 

0.34 

0.31 

47 

0.1 

0.26 

0.39 

0.24 

0.31 

48 

0.4 

0.32 

0.35 

0.31 

0.31 

49 

-0.1 

0.16 

0.35 

0.14 

0.31 

50 

0.3 

0.22 

0.31 

0.21 

0.31 
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Table  1  (continued) 

Time 

Observation 

Smoothed 

Smoothed 

Stage  (i) 

<\> 

Value  A  A 

Value  B 

-0.48 

-0.52 

-0.63 

-0.62 

-0.65 

-0.63 


-0 

-0 

-0 

-0 

-0 

-0 


Table  1  (continued) 


Time 

Observation 

Smoothed 

«A 

Smoothed 

«B 

Stage  (i) 

(x^) 

Value  A 

Value  B 

101 

-0.5 

-0.26 

0.18 

-0.27 

0.26 

102 

0.2 

-0.11 

0.17 

-0.09 

0.25 

103 

-0.6 

-0.27 

0.16 

-0.29 

0.25 

104 

0.0 

-0.18 

0.16 

-0.18 

0.24 

105 

0.0 

-0.13 

0.16 

-0.11 

0.24 

106 

-0.3 

-0.18 

0.15 

-0.18 

0.24 

107 

-0.8 

-0.37 

0.16 

-0.42 

0.24 

108 

-0.4 

-0.38 

0.16 

-0.41 

0.24 

109 

-0.1 

-0.29 

0.15 

-0.29 

0.24 

110 

-0.5 

-0.35 

0.15 

-0.37 

0.24 

111 

-0.4 

-0.37 

0.15 

-0.38 

0.24 

112 

-0.4 

-0.38 

0.15 

-0.39 

0.24 

113 

0.0 

-0.26 

0.14 

-0.24 

0.24 

114 

0.1 

-0.15 

0.15 

-0.11 

0.24 

115 

0.1 

-0.08 

0.16 

-0.03 

0.25 

116 

-0.3 

-0.15 

0.15 

-0.14 

0.24 

117 

-0.2 

-0.17 

0.15 

-0.16 

0.24 

118 

-0.7 

-0.33 

0.15 

-0.36 

0.24 

119 

-0.4 

-0.35 

0.15 

-0.38 

0.24 

120 

-0.2 

-0.30 

0.14 

-0.31 

0.24 

121 

-0.1 

-0.24 

0.14 

-0.23 

0.24 

122 

0.1 

-0.13 

0.14 

-0.11 

0.24 

123 

-0.2 

-0.16 

0.14 

-0.14 

0.24 

124 

0.0 

.-0.11 

0.14 

-0.09 

0.24 

125 

0.2 

-0.02 

0.15 

0.02 

0.24 

126 

0.3 

0.08 

0.16 

0.13 

0.25 

127 

0.2 

0.12 

0.16 

0.15 

0.25 

128 

0.3 

0.17 

0.17 

0.21 

0.25 

129 

0.2 

0.18 

0.17 

0.20 

0.25 

130 

0.2 

0.18 

0.17 

0.20 

0.25 

131 

0.5 

0.29 

0.17 

0.32 

0.26 

132 

-0.1 

0.16 

0.16 

0.16 

0.25 

133 

-0.1 

0.08 

0.16 

0.06 

0.25 

134 

-0.1 

0.02 

0.16 

0.00 

0.25 

135 

0.0 

0.02 

0.16 

0.00 

0.25 

136 

-0.5 

-0.15 

0.17 

-0.19 

0.26 

137 

-0.3 

-0.20 

0.18 

-0.23 

0.26 

138 

-0.2 

-0.20 

0.17 

-0.22 

0.26 

139 

-0.3 

-0.23 

0.18 

-0.25 

0.26 

140 

-0.3 

-0.25 

0.18 

0.27 

0.26 

141 

-0.4 

-0.30 

0.18 

-0.32 

0.26 

142 

-0.5 

-0.36 

0.18 

-0.39 

0.27 

143 

0.0 

-0.24 

0.17 

-0.24 

0.26 

144 

-0.3 

-0.26 

0.17 

-0.26 

0.26 

145 

-0.3 

-0.27 

0.17 

-0.28 

0.26 

146 

-0.1 

-0.22 

0.17 

-0.21 

0.26 

147 

0.4 

-0.01 

0.18 

0.03 

0.26 

148 

0.1 

0.02 

0.18 

0.06 

0.27 

149 

0.0 

0.01 

0.18 

0.03 

0.27 

150 

-0.2 

-0.06 

0.18 

-0.06 

0.26 

-13- 


ii 


J 


Tine 

Observation 

Stage  (i) 

(x.) 

151 

0.2 

152 

0.2 

153 

0.4 

154 

0.2 

155 

-0.1 

156 

-0.2 

157 

0.0 

158 

0.4 

159 

0.2 

160 

0.2 

161 

0.1 

162 

0.1 

163 

0.1 

164 

0.4 

165 

0.2 

166 

-0.1 

167 

-0.1 

168 

0.0 

169 

-0.3 

170 

-0.1 

171 

0.3 

172 

0.8 

173 

0.8 

174 

0.6 

175 

0.5 

176 

0.0 

177 

-0.1 

178 

0.1 

179 

0.2 

180 

0.4 

181 

0.5 

182 

0.9 

183 

0.0 

184 

0.0 

185 

0.0 

186 

0.2 

187 

0.3 

188 

0.4 

189 

0.4 

190 

0.0 

191 

1.0 

192 

1.2 

193 

0.6 

194 

0.8 

195 

0.7 

196 

0.2 

197 

0.4 

Table  1 

(continued) 

Smoothed 

a  . 

Value  A 

A 

0.03 

0.18 

0.08 

0.18 

0.19 

0.19 

0.19 

0.18 

0.09 

0.18 

0.01 

0.18 

0.00 

0.18 

0.13 

0.17 

0.15 

0.17 

0.17 

0.18 

0.14 

0.17 

0.13 

0.17 

0.12 

0.17 

0.21 

0.17 

0.21 

0.17 

0.11 

0.17 

0.04 

0.17 

0.03 

0.17 

-0.08 

0.18 

-0.08 

0.18 

0.04 

0.17 

0.29 

0.18 

0.48 

0.21 

0.52 

0.22 

0.51 

0.22 

0.33 

0.20 

0.18 

0.21 

0.15 

0.21 

0.17 

0.21 

0.25 

0.20 

0.34 

0.20 

0.54 

0.22 

0.34 

0.19 

0.23 

0.19 

0.15 

0.20 

0.17 

0.19 

0.22 

0.19 

0.28 

0.19 

0.32 

0.19 

0.21 

0.19 

0.47 

0.18 

0.74 

0.22 

0.68 

0.20 

0.72 

0.21 

0.71 

0.21 

0.53 

0.20 

0.49 

0.20 

Smoothed 

a. 

Value  B 

B 

0.04 

0.26 

0.10 

0.26 

0.22 

0.27 

0.21 

0.27 

0.09 

0.26 

-0.02 

0.27 

-0.01 

0.27 

0.15 

0.26 

0.17 

0.26 

0.18 

0.26 

0.15 

0.26 

0.13 

0.26 

0.12 

0.26 

0.23 

0.26 

0.22 

0.26 

0.09 

0.26 

0.02 

0.26 

0.01 

0.26 

-0.11 

0.27 

-0.11 

0.27 

0.05 

0.26 

0.35 

0.27 

0.53 

0.29 

0.56 

0.29 

0.53 

0.29 

0.32 

0.28 

0.15 

0.29 

0.13 

0.29 

0.16 

0.29 

0.26 

0.29 

0.36 

0.29 

0.58 

0.30 

0.34 

0.28 

0.21 

0.28 

0.12 

0.28 

0.16 

0.28 

0.21 

0.28 

0.29 

0.28 

0.33 

0.28 

0.20 

0.28 

0.52 

0.27 

0.80 

0.29 

0.71 

0.28 

0.75 

0.28 

0.73 

0.28 

0.52 

0.28 

0.47 

0.28 

5.  Poisson  Observations 


Suppose  now  that  we  replace  the  first  stage  of  the  model  in  (1.1)  by  the 
assumption  that  y1#  y2»***»ym»  are  independent  and  Poisson  distributed  with 
respective  means  0^,  02*,,**®m*  T*ie  might  for  example  represent  the 

numbers  of  items  per  successive  batch  by  an  industrial  process.  A  similar 
analysis  to  the  one  described  below  may  be  developed  if  y^  instead  possesses 
a  binomial  distribution  with  probability  6^  and  sample  size  n^  (e.g. 
ni  =  1  for  binary  process). 

In  the  Poisson  situation  we  retain  a  similar  second  stage  to  the  model  by 
supposing  that 

8i  -  8i_1  +  ei  (i  -  1, (5.1) 

where  e  ....,£  are  independent  terms  with  zero  means.  For  i  =  2,...,m  we 
l  m 

suppose  that  £^  possesses  variance  <*0^  ^ ,  so  that  the  variance  changes 

with  the  non-negative  mean  in  a  sensible  (and  technically  convenient) 

manner.  The  first  error  term  is  taken  to  possess  variance  Y0Q»  so  that 

01  has  a  prior  distribution  with  mean  8^  and  variance  Y0q.  No  further 

distributional  assumptions  need  to  be  made  about  e  ,...,£  until  we  present 

I  in 

ourselves  with  the  problem  of  estimating  a. 
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6.  Linear  Prediction 

Let  8*  denote  our  smoothed  value  for  8  ,  given  y^m\  and  for  j  = 
n#  0  m  «• 

1,2,...  let  8*  ,  represent  our  j-step  ahead  forecast  for  6  after 
®  #  3  m+j 

time  m.  Attention  is  for  the  moment  restricted  to  linear  predictors  of  the 
form 


9:,j  -  6oj  *  J,  Vi 


(6.1  ) 


( j  *  0, 1 ,2,...) 


where  8  ,...,8  .  represent  unknown  constants  not  depending  upon  the  data, 

J 

and  a  is  initially  taken  to  be  known.  Optimal  values  for  8_.,. ..,8  .  may 

0}  m] 

be  obtained  by  minimizing  the  Bayes  risk  of  8*  ,  under  the  quadratic  loss 

m,  j 

function 


L(8*  . ,0  . )  -  (8*  .  -  8  . )' 

m,j  m+j  m,j  m+j 


(6.2) 


The  first  two  moments  of  the  joint  distribution,  given  o,  of  y1#...,y 
and  8j,...,8  *-a  described  in  Section  5,  and  the  Bayes  risk  may  be  obtained 

by  taking  expectation  of  the  loss  function  in  (6.2)  with  respect  to  this  joint 
distribution.  Setting  8^  -  8^  for  i  «  0,1,..., m  and  j  -  0,  1,2,..., 
for  notational  simplicity  we  find  after  some  manipulation  that  the  Bayes  risk 
is  given  by 


■  (Vo  ♦•••*  8.eo  *  9o  -  v2 

♦  j«80 

+  (8?  +•••+  82)8 

i  mo 

(6.3) 

+  <*{(8  -  I)2  +  (8  +8  -  l)2  +•••+  (8  +•••+  8,  -  1 ) 2}9 

®  in  m-i  m  2  0 

+  y(b  +•••+  8,  -  i } 2e  . 
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Differentiating  the  expression  in  (6,3)  with  respect  to  the  we  find  that 

the  optimal  values  of  B^  do  not  depend  upon  j.  The  optimal  linear  values 

9*  .  in  (6.1)  may  therefore  be  set  equal  to  a  common  quantity  a  for  all 

m,]  “ 

j  =  0,  . 

We  now  introduce  an  analogy  with  the  normal  process  of  Section  1  in  order 
to  obtain  updating  formulae  for  am.  It  is  straightforward  to  show  that  this 
normal  process  provides  exactly  the  same  Bayes  risk  as  given  in  (6.3),  but 

with  0Q,  in  all  but  the  first  term  on  the  right  hand  side  of  (6.3),  replaced 

2  2  2. 
by  the  first  stage  variance  t  and  a  replaced  by  o  / T  .  The  optimal 

linear  predictors  in  the  Poisson  case  are  therefore,  with  appropriate 

substitutions  identical  to  those  in  the  normal  case.  However  the  optimal 

linear  predictors  0*  .  in  the  normal  case  are  identical  to  the  posterior 

m,D 

mean  am,  as  normality  implies  linearity .  We  therefore  have  the  pleasing 
result  that  affl  in  the  Poisson  situation  may  be  obtained  from  exactly  the 
same  updating  formulae  as  employed  in  Section  1,  namely 


a.  =  a.  , 

l  1-1 


+  D. (y.  -  a.  ) 
i  i  1-1 


(6.4) 


(i  —  1 , . » . ,m) 


and 


=  (Di_1  +  «)/(Di_1  +  a  +  1) 


(6.5) 


with  aQ  =  0Q  and  D1  =  Y/(1+Y). 

The  value  am  could  be  viewed  as  the  'best'  linear  approximation  to  the 


posterior  mean  of  9  after  time  m.  In  the  next  Section  we  show  how  this 

m 


result  may  be  approximately  generalized  to  the  situation  where  a  is  unknown 


7.  Estimation  of  a 


In  order  to  obtain  an  approximation  to  the  posterior  distribution  of  a 
we  introduce  an  alternative  method  which  also  approximately  justifies  the 
updating  formulae  in  (6.4)  and  (6.5).  Suppose  that  after  i-1  time  stages  the 
posterior  distribution  of  given  o,  possesses  mean  and  variance 


3i-ll? 


(i-1) 


~  (ai-i'Vi> 


(7.1  ) 


Then 


0i|y(l_1),a  ^  (ai_1,vi_1  +  oai_1)  .  (7.2) 

Suppose  that  the  distribution  in  (7.2)  is  approximately  Gamma  with  the 

stipulated  mean  and  variance.  As  9.  in  Poisson  with  mean  9  we  obtain. 

i  i 

applying  Bayes  theorem  and  some  manipulation 

e1|y(i)»®  ~  (ai'vi)  (7.3) 

where 


and 


Vi  +  vi‘Wi> 


(7.4) 


V 


i 


+  "l-(  1 


Setting  ®  yields  the  relations  in  (6.4)  and  (6.5).  The 
approximations  also  give 

yi  |y(i_1 3  ~  +  °  +  1  )»i_1 )  (7.5) 
and  that  this  distribution  is  approximately  a  Gamma  mixture  of  a  Poisson.  The 
density  is  given  by 


ai— 1 9i— 1 


r(Vi,r(4i.iV,)(1  +  w 


\-1gi-1  ♦  *i 


(7.6) 
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where 


g.  =  (D.  +  o) 

^i-1  1-1 

The  posterior  density  of  a  after  m  time  stages  is  therefore 


(7.7) 


approximated  by 


t(a|y*mS  *  ir(a)  II  p(y.  |y^ 
i=1  " 


m  a.  g .  ,  yi-1  fa.  ,g.  .  + 

'«>  "  «W>  *  *i-,» 1  1  ■  (  ■;  fr  ,  ) 


(7.8) 


(0  <  a  <  •) 

where  x(a)  denotes  the  prior  density;  we  recommend  the  flexible  choice 


w (a)*  a 


V2  V2 


_1/2  (V1+V2) 


(^•e  +  v,a' 


(7.9) 


so  that  <  a  possesses  an  F-distribution  with  and  degrees  of 


freedom.  For  >  2  the  prior  mean  of  o  is  then  equal  to  v2k/(v2-2>. 

The  unconditional  mean  a*  and  variance  v*  of  S  after  m 

m  mm 

observations  may  be  approximated  by  taking  expectations  with  respect  to  the 

distribution  in  (7.8)  of  am  in  (6.4)  and 

(a  -  a*)2  +  D  a 
mm  mm 

The  predictive  mean  of  any  future  observation  Ym+j  is  also  approximated 

by  a*  and  the  predicted  variance  of  ym+j  may  be  approximated  by  taking 

expectations  with  respect  to  the  distribution  in  (7.8)  of  the  quantity 

(a  -  a*)2  +  (1  +  jo  +  D)a  . 
mm  mm 

It  may  be  reasonable  to  take  the  unconditional  predictive  distributions 


of  the  ym+j  to  be  approximately  Poisson-Gamma  with  the  means  and  variances 


described  above. 


Updating  in  the  Poisson  situation  is  just  as  straightforward  as  for  the 
method  for  normal  observations  described  in  Section  3.  After  any  time  stage 
i  it  is  necessary  to  store  the  values  a^,  D^,  and  W(a)  for  all  a  lying 


in  8  in  (3.1),  where  W(a)  denotes  the  expression  in  the  right  hand  side  of 
(7.8). 


Given  the  stores  values  after  time  i-1  we  update  to  new  values  after 
time  i  by  multiplying  the  old  value  of  W(o)  by  the  expression  in  the  right 
hand  side  of  (7.6),  and  using  (6.4)  and  (6.5)  to  update  a^  and  D^. 
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9.  An  Analysis  of  an  Industrial  Control  Chart 

The  52  Poisson  observations  in  the  second  column  of  Table  2  were 

introduced  by  Hald  ([3],  p.  720),  and  represent  numbers  of  defective  items  in 

consecutive  shifts  of  an  industrial  process. 

The  smoothed  values  (A)  in  the  third  column  correspond  to  a  uniform  prior 

distribution  for  a  (set  v  =2,  =  -2,  <  *  0,  and  Y  =  ®  in  the 

analysis  of  Section  7).  The  posterior  distribution  of  a  after  52 

observations  possessed  a  mode  at  a  =  0  and  a  very  thick  tail.  The  value 

am  =  0.05  suggests  that  whilst  the  random  noise  in  the  process  is  primarily 
A 

caused  by  the  first-stage  fluctuations  of  the  Poisson  observations  y^^  about 
their  mean  0^,  there  is  some  definite  evidence  that  the  0^  are  changing  in 
time.  The  final  smoothed  value  of  2.93  may  be  compared  with  the  average 

3.23  of  the  52  observations.  The  predictive  variance  of  x53  is  about 

1.24  times  its  predictive  mean  of  2,93,  suggesting  that  the  predictive 
distribution  of  x53  is  not  approximately  Poisson.  A  Poisson-Gamma 
distribution  with  this  mean  and  variance  might  be  more  reasonable. 

We  also  employed  the  F-distribution  in  (7.9)  as  prior  for  a.  The 


choices  v 


y>2  -  10,  ic  ■  0.20,  and  Y  ■  "  lead  to  a  prior  mean  of  0.25 


for  a.  The  corresponding  smoothed  values  (B)  are  listed  in  the  fifth  column 

of  Table  2;  the  estimates  a  in  the  sixth  column  fluctuate  rather  less  than 

0 

the  a^  in  the  fourth  column.  The  final  posterior  distribution  of  a 
possessed  a  mode  at  a  =*  0.07,  its  mean  at  a  *  0.10,  and  a  thick  positive 
tail. 
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Table  2;  Analysis  of  an  Industrial  Control  Chart 


Time 

Observation 

Smoothed 

Stage  (i) 

(*i> 

Value  A 

1 

3 

3.00 

2 

1 

1.81 

3 

0 

0.90 

4 

7 

3.89 

5 

3 

3.39 

6 

4 

3.63 

7 

4 

3.76 

8 

4 

3.83 

9 

5 

4.29 

10 

3 

3.72 

11 

2 

3.03 

12 

3 

3.06 

13 

3 

3.07 

14 

5 

3.76 

15 

1 

2.77 

16 

2 

2.55 

17 

2 

2.41 

8 

5 

3.28 

19 

2 

2.87 

20 

5 

3.49 

21 

2 

3.04 

•  22 

1 

2.47 

23 

3 

2.68 

24 

2 

2.53 

25 

3 

2.68 

26 

5 

3.23 

27 

1 

2.69 

28 

3 

2.79 

29 

0 

2.17 

30 

2 

2.19 

31 

6. 

3.00 

32 

5 

3.38 

33 

9 

5.00 

34 

4 

4.55 

35 

4 

4.32 

36 

4 

4.19 

37 

4 

4.11 

38 

6 

4.54 

39 

0 

3.47 

40 

7 

4.25 

41 

3 

3.96 

42 

4 

3.96 

43 

4 

3  96 

44 

1 

3.37 

45 

2 

3.11 

46 

3 

3.13 

47 

3 

3.14 

48 

1 

2.70 

49 

2 

2.59 

50 

3 

2.72 

51 

6 

3.36 

52 

1 

2.93 

°A 

Smoothed 

a 

A 

Value  B 

°B 

0.50 

3.00 

0.25 

0.50 

1.89 

0.24 

0.49 

1.07 

0.24 

0.51 

3.44 

0.25 

0.48 

3.24 

0.24 

0.46 

3.50 

0.23 

0.43 

3.66 

0.22 

0.41 

3.77 

0.22 

0.39 

4.19 

0.21 

0.36 

3.76 

0.20 

0.35 

3.16 

0.20 

0.32 

3.12 

0.19 

0.29 

3.09 

0.19 

0.26 

3.72 

0.18 

0.27 

2.82 

0.18 

0.25 

2.57 

0.18 

0.24 

2.40 

0.17 

0.20 

3.24 

0.16 

0.18 

2.84 

0.16 

0.17 

3.51 

0.15 

0.15 

3.04 

0.15 

0.17 

2.41 

0.15 

0.14 

2.61 

0.14 

0.13 

2.43 

0.14 

0.11 

2.61 

0.14 

0.10 

3.31 

0.13 

0.10 

2.63 

0.13 

0.09 

2.74 

0.13 

0.09 

1.96 

0.13 

0.08 

1.99 

0.12 

0.07 

3.12 

0.12 

0.07 

3.64 

0.12 

0.14 

5.30 

0.15 

0.12 

4.84 

0.14 

0.11 

4.57 

0.13 

0.10 

4.39 

0.13 

0.09 

4.27 

0.13 

0.09 

4.76 

0.12 

0.09 

3.40 

0.12 

0.08 

4.42 

0.12 

0.07 

4.03 

0.11 

0.07 

4.02 

0.11 

0.06 

4.02 

0.11 

0.07 

3.19 

0.11 

0.07 

2.87 

0.11 

0.06 

2.93 

0.11 

0.06 

2.96 

0.11 

0.07 

2.43 

0.11 

0.07 

2.33 

0.11 

0.06 

2.53 

0.11 

0.05 

3.45 

0.10 

0.05 

2.80 

0.10 

l 
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